1 


Mixed  Reality  on  a  Virtual  Globe 

Zhuming  Ai  and  Mark  A.  Livingston 

3D  Virtual  and  Mixed  Environments 
Information  Management  and  Decision  Architectures, 

Naval  Research  Laboratory 
Washington 
USA 


1.  Introduction 

Augmented  reality  (AR)  and  mixed  reality  (MR)  are  being  used  in  urban  leader  tactical 
response,  awareness  and  visualization  applications  (Livingston  et  al.,  2006;  Urban  Leader 
Tactical  Response,  Awareness  &  Visualization  (ULTRA-Vis),  n.d.).  Fixed-position  surveillance 
cameras,  mobile  cameras,  and  other  image  sensors  are  widely  used  in  security  monitoring 
and  command  and  control  for  special  operations.  Video  images  from  video  see-through  AR 
display  and  optical  tracking  devices  may  also  be  fed  to  command  and  control  centers.  The 
ability  to  let  the  command  and  control  center  have  a  view  of  what  is  happening  on  the  ground 
in  real  time  is  very  important  for  situation  awareness.  Decisions  need  to  be  made  quickly 
based  on  a  large  amount  of  information  from  multiple  image  sensors  from  different  locations 
and  angles.  Usually  video  streams  are  displayed  on  separate  screens.  Each  image  is  a  2D 
projection  of  the  3D  world  from  a  particular  position  at  a  particular  angle  with  a  certain  field 
of  view.  The  users  must  understand  the  relationship  among  the  images,  and  recreate  a  3D 
scene in their minds. This is a frustrating process, especially in an unfamiliar area, as may
be the case for tactical operations.

AR  is,  in  general,  a  first-person  experience.  It  is  the  combination  of  real  world  and 
computer-generated  data  from  the  user's  perspective.  For  instance,  an  AR  user  might  wear 
translucent  goggles;  through  these,  he  can  see  the  real  world  as  well  as  computer-generated 
images  projected  on  top  of  that  world  (Azuma,  1997).  In  some  AR  applications,  such  as  the 
battlefield situation awareness AR application and other mobile outdoor AR applications
(Hollerer  et  al.,  1999;  Piekarski  &  Thomas,  2003),  it  is  useful  to  let  a  command  and  control 
center  monitor  the  situation  from  a  third-person  perspective. 

Our  objective  is  to  integrate  geometric  information,  georegistered  image  information,  and 
other  georeferenced  information  into  one  mixed  environment  that  reveals  the  geometric 
relationship  among  them.  The  system  can  be  used  for  security  monitoring,  or  by  a  command 
and  control  center  to  direct  a  field  operation  in  an  area  where  multiple  operators  are  engaging 
in  a  collaborative  mission,  such  as  a  SWAT  team  operation,  border  patrol,  security  monitoring, 
etc.  It  can  also  be  used  for  large  area  intelligence  gathering  or  global  monitoring.  For  outdoor 
MR  applications,  geographic  information  systems  (GIS)  or  virtual  globe  systems  can  be  used 
as  platforms  for  such  a  purpose. 

2.  Related  work 

On  the  reality-virtuality  continuum  (Milgram  et  al.,  1995),  our  work  is  close  to  augmented 
virtuality,  where  the  real  world  images  are  dynamically  integrated  into  the  virtual  world  in 
real time (Milgram & Kishino, 1994). This project works closely with our AR situation
awareness application, so it is referred to as an MR-based application in this paper.

Although projecting real-time images on top of 3D models has been widely practiced
(Hagbi et al., 2008), and there have been some attempts at augmenting live video streams for
remote participation (Wittkamper et al., 2007) and remote videoconferencing (Regenbrecht
et al., 2003), we have found no work on integrating georegistered information on a virtual
globe for MR applications.

Google  Earth  has  been  explored  for  AR/MR  related  applications  to  give  "remote  viewing" 
of  geo-spatial  information  (Frohlich  et  al.,  2006)  and  urban  planning  (Phan  &  Choo,  2010). 
Keyhole  Markup  Language  (KML)  files  used  in  Google  Earth  have  been  used  for  defining  the 
augmented  object  and  its  placement  (Honkamaa,  2007).  Different  interaction  techniques  are 
designed  and  evaluated  for  navigating  Google  Earth  (Dubois  et  al.,  2007). 

The benefit of the third-person perspective in AR was discussed by Salamin et al. (2006).
They found that the third-person perspective is usually preferred for displacement actions
and interaction with moving objects. This is mainly due to the larger field of view provided by
the  position  of  the  camera  for  this  perspective.  We  believe  that  our  AR  applications  can  also 
benefit  from  their  findings. 

There  are  some  studies  of  AR  from  the  third-person  view  in  gaming.  To  avoid  the  use  of 
expensive,  delicate  head-mounted  displays,  a  dice  game  in  a  third-person  AR  was  developed 
(Colvin et al., 2003). The user tests found that players have no problem adapting to the
third-person  screen.  The  third-person  view  was  also  used  as  an  interactive  tool  in  a  mobile 
AR  application  to  allow  users  to  view  the  contents  from  points  of  view  that  would  normally 
be  difficult  or  impossible  to  achieve  (Bane  &  Hollerer,  2004). 

AR  technology  has  been  used  together  with  GIS  and  virtual  globe  systems  (Hugues  et  al., 
2011).  A  GIS  system  has  been  used  to  work  with  AR  techniques  to  visualize  landscape 
(Ghadirian  &  Bishop,  2008).  A  handheld  AR  system  has  been  developed  for  underground 
infrastructure  visualization  (Schall  et  al.,  2009).  A  mobile  phone  AR  system  tried  to  get  content 
from  Google  Earth  (Henrysson  &  Andel,  2007). 

The novelty of our approach lies in overlaying georegistered information, such as real-time
images, icons, and 3D models, on top of Google Earth. This allows a viewer to see the scene
not only from the camera's position but also from a third-person perspective. When information
from multiple sources is integrated, it provides a useful tool for command and control centers.

3.  Methods 

Our  approach  is  to  partially  recreate  and  update  the  live  3D  scene  of  the  area  of  interest 
by  integrating  information  with  spatial  georegistration  and  time  registration  from  different 
sources  on  a  virtual  globe  in  real  time  that  can  be  viewed  from  any  perspective.  This 
information  includes  video  images  (fixed  or  mobile  surveillance  cameras,  traffic  control 
cameras, and other video cameras that are accessible on the network), photos from
high-altitude sensors (satellite and unmanned aerial vehicle), tracked objects (personal and
vehicle agents and tracked targets), and 3D models of the monitored area.

GIS  or  virtual  globe  systems  are  used  as  platforms  for  such  a  purpose.  The  freely  available 
virtual  globe  application,  Google  Earth,  is  very  suitable  for  such  an  application,  and  was  used 
in  our  preliminary  study  to  demonstrate  the  concept. 

The  target  application  for  this  study  is  an  AR  situation  awareness  application  for  military 
or  public  security  uses  such  as  battlefield  situation  awareness  or  security  monitoring. 
An  AR  application  that  allows  multiple  users  wearing  a  backpack-based  AR  system  or 
viewing  a  vehicle  mounted  AR  system  to  perform  different  tasks  collaboratively  has  been 
developed (Livingston et al., 2006). Fixed-position surveillance cameras are also included in
the  system.  In  these  collaborative  missions  each  user's  client  sends  his/her  own  location  to 
other  users  as  well  as  to  the  command  and  control  center.  In  addition  to  the  position  of  the 
users,  networked  cameras  on  each  user's  system  can  stream  videos  back  to  the  command  and 
control  center. 

The  ability  to  let  the  command  and  control  center  have  a  view  of  what  is  happening  on  the 
ground  in  real  time  is  very  important.  This  is  usually  done  by  overlaying  the  position  markers 
on  a  map  and  displaying  videos  on  separate  screens.  In  this  study  position  markers  and  videos 
are integrated in one view. This can be done within the AR application, but freely available
virtual globe applications, such as Google Earth, are also well suited to this need if live
AR information can be overlaid on the globe. A virtual globe also has the advantage of having
satellite or aerial photos available at any time. When the avatars and video images are
projected on a virtual globe, command and control operators get a detailed view not only of the
geometric structure but also of the live image of what is happening.

3.1  Georegistration 

In  order  to  integrate  the  video  images  on  the  virtual  globe,  they  first  need  to  be  georegistered 
so  that  they  can  be  projected  at  the  right  place.  The  position,  orientation,  and  field  of  view  of 
all  the  image  sensors  are  needed. 

For  mobile  cameras,  such  as  vehicle  mounted  or  head  mounted  cameras,  the  position  and 
orientation  of  the  camera  are  tracked  by  GPS  and  inertial  devices.  For  a  fixed-position 
surveillance  camera,  the  position  is  fixed  and  can  be  surveyed  with  a  surveying  tool.  A 
calibration  process  was  developed  to  correct  the  errors. 

The  field  of  view  and  orientation  of  the  cameras  may  be  determined  (up  to  a  scale  factor)  by 
a  variety  of  camera  calibration  methods  from  the  literature  (Hartley  &  Zisserman,  2004).  For 
a  pan-tilt-zoom  camera,  all  the  needed  parameters  are  determined  from  the  readings  of  the 
camera  after  initial  calibration.  The  calibration  of  the  orientation  and  the  field  of  view  is  done 
manually  by  overlaying  the  video  image  on  the  aerial  photo  images  on  Google  Earth. 

3.2  Projection 

In general there are two kinds of georegistered objects that need to be displayed on the virtual
globe. One is objects with 3D position information, such as icons representing the position of
users or objects. The other is 2D image information.

Overlaying iconic georegistered information on Google Earth is relatively simple. The AR
system  distributes  each  user's  location  to  all  other  users.  This  information  is  converted  from 
the  local  coordinate  system  to  the  globe  longitude,  latitude,  and  elevation.  Then  an  icon  can 
be  placed  on  Google  Earth  at  this  location.  This  icon  can  be  updated  at  a  predefined  interval, 
so  that  the  movement  of  all  the  objects  can  be  displayed. 
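
The conversion and icon placement described above can be sketched as follows. This is a minimal illustration, not the system's actual code: it assumes the local coordinate system is an east/north/up frame in metres around a surveyed origin, and it uses a flat-earth approximation that is only reasonable over a few kilometres. The function names and the origin values in the usage lines are hypothetical.

```python
import math
from xml.sax.saxutils import escape

EARTH_RADIUS_M = 6378137.0  # WGS-84 equatorial radius, treated here as a sphere radius


def local_to_geodetic(east_m, north_m, up_m, origin_lat, origin_lon, origin_alt):
    """Convert a local east/north/up offset (metres) to (lat, lon, alt).

    Assumption: flat-earth approximation around the origin, adequate only for
    small areas such as a single operation zone.
    """
    lat = origin_lat + math.degrees(north_m / EARTH_RADIUS_M)
    lon = origin_lon + math.degrees(
        east_m / (EARTH_RADIUS_M * math.cos(math.radians(origin_lat))))
    return lat, lon, origin_alt + up_m


def placemark_kml(name, lat, lon, alt):
    """Return a KML Placemark for one tracked user or object.

    KML writes coordinates in the order longitude,latitude,altitude.
    """
    return (
        "<Placemark>"
        f"<name>{escape(name)}</name>"
        "<Point><altitudeMode>absolute</altitudeMode>"
        f"<coordinates>{lon:.7f},{lat:.7f},{alt:.2f}</coordinates>"
        "</Point></Placemark>"
    )


# Hypothetical origin and offset, for illustration only.
lat, lon, alt = local_to_geodetic(25.0, -40.0, 1.8, 38.82, -77.02, 10.0)
icon = placemark_kml("AR user 1", lat, lon, alt)
```

Regenerating the placemark at each update interval with the latest position is what makes the icon appear to move on the globe.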

Overlaying the 2D live video images on the virtual globe is more complex. The images need to
be projected on the ground, as well as on all the other objects, such as buildings. Strictly
speaking, these projections cannot be performed unless all of the 3D information along the
projection paths is known. However, it is accurate enough in practice to project the
images only on the ground and on large objects such as buildings. Many studies have been done
to  create  urban  models  based  on  image  sequences  (Beardsley  et  al.,  1996;  Jurisch  &  Mountain, 
2008;  Tanikawa  et  al.,  2002).  It  is  a  non-trivial  task  to  obtain  these  attributes  in  the  general  case 
of  an  arbitrary  location  in  the  world.  Automated  systems  (Pollefeys,  2005;  Teller,  1999)  are 
active  research  topics,  and  semi-automated  methods  have  been  demonstrated  at  both  large 
and small scales (Julier et al., 2001; Lee et al., 2002; Piekarski & Thomas, 2003). Since it is
difficult to recreate 3D models in real time from a few images, the images are instead projected
on known 3D models, at least in the early stages of the study.

To display the images on Google Earth correctly, projected texture maps must be created for the
ground and the buildings. This requires the projected images as well as the location and
orientation of the texture maps. An OpenSceneGraph (OpenSceneGraph, n.d.) based rendering program is
used  to  create  the  texture  maps  in  the  frame-buffer.  This  is  done  by  treating  the  video  image 
as  a  rectangle  with  texture.  The  rectangle's  position  and  orientation  are  calculated  from  the 
camera's  position  and  orientation.  When  viewing  from  the  camera  position  and  using  proper 
viewing  and  projection  transformations,  the  needed  texture  maps  can  be  created  by  rendering 
the  scene  to  the  frame-buffer. 

The  projection  planes  are  the  ground  plane  and  the  building  walls.  This  geometric  information 
comes  from  a  database  created  for  the  target  zone.  Although  Google  Earth  has  3D  buildings 
in  many  areas,  including  our  target  zone,  this  information  is  not  available  for  Google  Earth 
users and thus cannot be used for our calculations. Besides, the accuracy of Google Earth 3D
buildings varies from place to place. Our measurements show that our database is much
more accurate in this area.

To  create  the  texture  map  of  the  wall,  an  asymmetric  perspective  viewing  volume  is  needed. 
The  viewing  direction  is  perpendicular  to  the  wall  so  when  the  video  image  is  projected  on  the 
wall,  the  texture  map  can  be  created.  The  viewing  volume  is  a  frustum  of  a  pyramid  which  is 
formed  with  the  camera  position  as  the  apex,  and  the  wall  (a  rectangle)  as  the  base. 
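
One way to obtain such an asymmetric viewing volume is sketched below, assuming the wall is given by its center, unit vectors along its width and height, and its half extents. The function returns left/right/bottom/top values at a chosen near distance, in the style of the parameters taken by OpenGL's `glFrustum`. The function name and the input representation are illustrative assumptions rather than the implementation used in this study.

```python
def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def wall_frustum(cam, center, right, up, half_w, half_h, near):
    """Off-axis frustum with the camera as apex and the wall rectangle as base.

    right and up are unit vectors spanning the wall; the view direction is the
    negated wall normal (right x up), i.e. perpendicular to the wall.
    Returns (left, right, bottom, top, near, far), where far is the
    perpendicular distance from the camera to the wall plane.
    """
    normal = _cross(right, up)
    to_center = _sub(center, cam)
    dist = -_dot(to_center, normal)
    if dist <= 0:
        raise ValueError("camera must be in front of the wall")
    scale = near / dist                      # similar triangles: wall plane -> near plane
    cx, cy = _dot(to_center, right), _dot(to_center, up)
    return ((cx - half_w) * scale, (cx + half_w) * scale,
            (cy - half_h) * scale, (cy + half_h) * scale,
            near, dist)


# Hypothetical wall facing south, 8 m wide and 10 m high, camera 10 m away and off-center.
l, r, b, t, n, f = wall_frustum(cam=(2.0, -10.0, 0.0), center=(0.0, 0.0, 5.0),
                                right=(1.0, 0.0, 0.0), up=(0.0, 0.0, 1.0),
                                half_w=4.0, half_h=5.0, near=1.0)
```

Because the frustum edges pass exactly through the wall's corners, the rendered frame covers the wall and nothing else, so it can be used directly as that wall's texture map.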

When  projecting  on  the  ground,  the  area  of  interest  is  first  divided  into  grids  of  proper  size. 
When  each  rectangular  region  of  the  grid  is  used  instead  of  the  wall,  the  same  projection 
method  for  the  wall  described  above  can  be  used  to  render  the  texture  map  in  the  frame-buffer. 

The position and size of the rectangular region change when the camera moves or
rotates. The resolution of the texture map is kept roughly the same as that of the video image
regardless of the size of the region, so that the details of the video image are maintained
while the memory requirement is kept to a minimum. To calculate the region of the projection
on the ground, a transformation matrix is needed to project the corners of the video image to
the ground:

$$M = P \times T \times R$$

where R and T are the rotation and translation matrices that transform the camera to the right
position and orientation, and P is the projection matrix, which is

$$P = \begin{pmatrix} d & 0 & 0 & 0 \\ 0 & d & 0 & 0 \\ 0 & 0 & -d & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}$$

where  d  is  the  distance  between  the  camera  and  the  projection  plane  (the  ground). 
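
As an illustration of what this corner projection computes, the sketch below finds the ground footprint of a camera image by intersecting the four corner rays with the ground plane. It deliberately uses a plain ray-plane intersection rather than the matrix formulation above. The angle conventions (yaw clockwise from north, negative pitch looking down), the flat ground at height zero, and all names are assumptions made for the example.

```python
import math


def ground_footprint(cam_pos, yaw_deg, pitch_deg, hfov_deg, vfov_deg):
    """Intersect the four corner rays of a camera image with the plane z = 0.

    cam_pos is (east, north, up) in metres with up > 0. Corners are returned as
    (east, north) pairs in the order bottom-left, bottom-right, top-right,
    top-left. A corner whose ray points at or above the horizon never reaches
    the ground and is returned as None.
    """
    yaw, pitch = math.radians(yaw_deg), math.radians(pitch_deg)
    forward = (math.sin(yaw) * math.cos(pitch),
               math.cos(yaw) * math.cos(pitch),
               math.sin(pitch))
    right = (math.cos(yaw), -math.sin(yaw), 0.0)
    up = (-math.sin(yaw) * math.sin(pitch),     # right x forward
          -math.cos(yaw) * math.sin(pitch),
          math.cos(pitch))
    tan_h = math.tan(math.radians(hfov_deg) / 2.0)
    tan_v = math.tan(math.radians(vfov_deg) / 2.0)

    corners = []
    for sx, sy in ((-1, -1), (1, -1), (1, 1), (-1, 1)):
        ray = tuple(f + sx * tan_h * r + sy * tan_v * u
                    for f, r, u in zip(forward, right, up))
        if ray[2] >= -1e-12:                    # ray does not descend toward the ground
            corners.append(None)
            continue
        t = -cam_pos[2] / ray[2]                # ray parameter where z reaches 0
        corners.append((cam_pos[0] + t * ray[0], cam_pos[1] + t * ray[1]))
    return corners


# Hypothetical roof camera 10 m above the ground, looking straight down.
footprint = ground_footprint((0.0, 0.0, 10.0), 0.0, -90.0, 90.0, 90.0)
```

The bounding box of the returned corners indicates which grid regions need a new texture map for the current camera pose.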

While  the  camera  is  moving,  it  is  possible  to  keep  the  previous  textures  and  only  update  the 
parts  where  new  images  are  available.  In  this  way,  a  large  region  will  be  eventually  updated 
when  the  camera  pans  over  the  area. 

The  zooming  factor  of  the  video  camera  can  be  converted  to  the  field  of  view.  Together  with 
the  position  and  orientation  of  the  camera  that  are  tracked  by  GPS,  inertial  devices,  and 
pan-tilt  readings  from  the  camera,  we  can  calculate  where  to  put  the  video  images.  The 
position  and  size  of  the  image  can  be  arbitrary  as  long  as  it  is  along  the  camera  viewing 
direction,  with  the  right  orientation  and  a  proportional  size. 
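
The placement rule can be made concrete with a short sketch. It assumes an ideal pinhole camera in which the zoom factor scales the focal length relative to the widest setting; a real pan-tilt-zoom camera would need a measured zoom-to-field-of-view table instead. The function names and the numbers in the usage lines are hypothetical.

```python
import math


def zoom_to_hfov(wide_hfov_deg, zoom):
    """Horizontal field of view at a given zoom factor (pinhole assumption).

    zoom = 1 corresponds to the widest setting; doubling the zoom doubles the
    focal length and halves the tangent of the half angle.
    """
    half = math.radians(wide_hfov_deg) / 2.0
    return math.degrees(2.0 * math.atan(math.tan(half) / zoom))


def image_rectangle(cam_pos, forward, distance, hfov_deg, aspect):
    """Center, width and height of the textured rectangle placed along the view ray.

    forward is a unit vector; any positive distance works because the size
    grows in proportion, so the rectangle always fills the same field of view.
    """
    center = tuple(c + distance * f for c, f in zip(cam_pos, forward))
    width = 2.0 * distance * math.tan(math.radians(hfov_deg) / 2.0)
    return center, width, width / aspect


hfov = zoom_to_hfov(wide_hfov_deg=42.0, zoom=2.0)   # illustrative values
center, w, h = image_rectangle((0.0, 0.0, 10.0), (0.0, 1.0, 0.0), 5.0, hfov, 4.0 / 3.0)
```

Choosing a larger distance only scales the rectangle; the image seen from the camera position is unchanged, which is why the position along the viewing direction can be arbitrary.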


3.3  Rendering 

The  rendering  of  the  texture  is  done  with  our  AR/MR  rendering  engine  which  is  based  on 
OpenSceneGraph.  A  two-pass  rendering  process  is  performed  to  remove  part  of  the  views 
blocked  by  the  buildings. 

In  the  first  pass,  all  of  the  3D  objects  in  our  database  are  disabled  and  only  the  camera  image 
rectangle  is  in  the  scene.  The  rendered  image  is  grabbed  from  the  frame-buffer.  Thus  a 
projected  image  of  the  video  is  obtained.  In  the  second  pass  the  camera  image  rectangle  is 
removed  from  the  scene.  The  grabbed  image  in  the  first  pass  is  used  as  a  texture  map  and 
applied  on  the  projection  plane  (the  ground  or  the  walls).  All  the  3D  objects  in  the  database 
(mainly  buildings)  are  rendered  as  solid  surfaces  with  a  predefined  color  so  that  the  part  on  the 
projection  plane  that  is  blocked  is  covered.  The  resulting  image  is  read  from  the  frame-buffer 
and  used  as  texture  map  in  Google  Earth.  A  post-processing  stage  changes  the  blocked  area 
to  transparent  so  that  the  satellite / aerial  photos  on  Google  Earth  are  still  visible. 
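
The post-processing stage amounts to a color-key operation on the image read back from the frame-buffer. The sketch below shows the idea on a plain list-of-rows image. The key color, the RGB tuple representation, and the function name are assumptions for illustration; the predefined color actually used in the system is not stated here.

```python
KEY_COLOR = (255, 0, 255)  # hypothetical "blocked" color used for occluding buildings


def key_to_alpha(pixels, key=KEY_COLOR):
    """Turn every pixel that exactly matches the key color fully transparent.

    pixels is a list of rows of (r, g, b) tuples; the result has the same shape
    with (r, g, b, a) tuples, where a is 0 for keyed pixels and 255 otherwise.
    """
    return [[(r, g, b, 0 if (r, g, b) == key else 255) for (r, g, b) in row]
            for row in pixels]


# Tiny 2x2 example: one pixel is covered by a building rendered in the key color.
frame = [[(10, 20, 30), KEY_COLOR],
         [(40, 50, 60), (70, 80, 90)]]
texture = key_to_alpha(frame)
```

An exact color match only works if the occluding objects are rendered without lighting, blending, or anti-aliasing, so that the key color reaches the frame-buffer unchanged.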


3.4  Google  Earth  interface 

Google  Earth  uses  KML  to  overlay  placemarks,  images,  etc.  on  the  virtual  globe.  3D  models 
can  be  built  in  Collada  format  and  displayed  on  Google  Earth.  A  Google  Earth  interface 
module for our MR system has been developed. This module is a Hypertext Transfer Protocol
(HTTP) server that sends icons and image data to Google Earth. A small KML file loaded
into Google Earth sends update requests to the server at a fixed interval and updates
the received icons and images on Google Earth.
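
The small KML file can be approximated with a `NetworkLink` whose `Link` uses interval-based refresh, and the ground texture can be returned by the server as a `GroundOverlay`. The sketch below only generates the two KML documents as strings. The URLs, names, and bounding box values are hypothetical, and the KML actually served by the module may differ.

```python
from xml.sax.saxutils import escape

KML_HEADER = ('<?xml version="1.0" encoding="UTF-8"?>\n'
              '<kml xmlns="http://www.opengis.net/kml/2.2">\n')


def network_link_kml(href, interval_s):
    """KML loaded once into Google Earth; it re-fetches href every interval_s seconds."""
    return (KML_HEADER +
            '  <NetworkLink>\n'
            '    <name>Live MR overlay</name>\n'
            '    <Link>\n'
            f'      <href>{escape(href)}</href>\n'
            '      <refreshMode>onInterval</refreshMode>\n'
            f'      <refreshInterval>{interval_s}</refreshInterval>\n'
            '    </Link>\n'
            '  </NetworkLink>\n'
            '</kml>\n')


def ground_overlay_kml(image_href, north, south, east, west):
    """KML the server could return on each request: one texture draped on the ground."""
    return (KML_HEADER +
            '  <GroundOverlay>\n'
            '    <name>Projected video</name>\n'
            f'    <Icon><href>{escape(image_href)}</href></Icon>\n'
            '    <LatLonBox>\n'
            f'      <north>{north}</north><south>{south}</south>\n'
            f'      <east>{east}</east><west>{west}</west>\n'
            '    </LatLonBox>\n'
            '  </GroundOverlay>\n'
            '</kml>\n')


# Hypothetical server address and overlay bounds.
link = network_link_kml("http://localhost:8080/update.kml", 0.5)
overlay = ground_overlay_kml("http://localhost:8080/ground.png",
                             38.8215, 38.8205, -77.0190, -77.0205)
```

A `GroundOverlay` covers only textures lying on the ground; textures on vertical walls would need textured 3D geometry, such as the Collada models mentioned above.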

4.  Results 

An  information  integration  prototype  module  with  the  Battlefield  Augmented  Reality  System 
(BARS)  (Livingston  et  al.,  2004)  has  been  implemented.  This  module  is  an  HTTP  server 
implemented  in  C++  that  sends  icons  and  image  data  to  Google  Earth.  The  methods  are 
tested  in  a  typical  urban  environment.  One  user  roams  the  area  while  another  object  is  a 
fixed  pan-tilt-zoom  network  surveillance  camera  (AXIS  213  PTZ  Network  Camera)  mounted 
on  top  of  the  roof  on  a  building  by  a  parking  lot.  This  simulates  a  forward  observation  post 
in  military  applications  or  surveillance  camera  in  security  applications.  The  command  and 
control  center  is  located  at  a  remote  location  running  the  MR  application  and  Google  Earth. 
Both the server module and Google Earth run on a Windows XP machine with dual
3.06 GHz Intel Xeon CPUs, 2 GB RAM, and an NVIDIA Quadro4 900XGL graphics card.


Fig.  1.  Video  image  of  the  parking  lot  and  part  of  a  building  from  a  surveillance  video 
camera  on  the  roof  top. 

The testing area is a parking lot and some nearby buildings. Figure 1 is the video image from
the roof top pan-tilt-zoom camera when it is pointing at the parking lot. One of the parking
lot corners, with a building, is in the camera view. Another AR user is on the ground in the
parking lot; the image captured by this user, shown in Figure 2, shows part of the
building.

Google  Earth  can  display  3D  buildings  in  this  area.  When  the  3D  building  feature  in  Google 
Earth  is  enabled,  the  final  result  is  shown  in  Figure  4.  The  images  are  projected  on  the 
buildings  as  well  as  on  the  ground  and  overlaid  on  Google  Earth,  together  with  the  icon  of  an 
AR  user  (right  in  the  image)  and  the  icon  representing  the  camera  on  the  roof  of  the  building 
(far left in the image). The parking lot part is projected on the ground and the building part
(the windows, the door, and part of the walls) is projected on vertical polygons representing
the walls of the building. The model of the building is from the database used in our AR/MR
system. When the texture is created, the part that is not covered by the video image is made
transparent, so it blends into the aerial image well. The part of the view blocked by the
building is removed from the projected image on the ground.

Fig. 3. Image of the target zone on Google Earth.

Fig. 4. Recreated 3D scene viewed with 3D buildings on Google Earth. The two field
operators' icons and the video image are overlaid on Google Earth.

Google  Earth  supports  3D  interaction;  the  user  can  navigate  in  3D.  This  gives  the  user  the 
ability  to  move  the  viewpoint  to  any  position.  Figure  4  is  from  Google  Earth  viewed  from  an 
angle  instead  of  looking  straight  down.  This  third-person  view  is  very  suitable  in  command 
and control applications. The projected images are updated at a 0.5 second interval, so viewers
can see what is happening live on the ground. It should be pointed out that the 3D building
information in Google Earth is not very accurate in this area (especially the height of the
buildings), but it is a good reference for our study.

The  result  shows  the  value  of  this  study  which  integrates  information  from  multiple  sources 
into  one  mixed  environment.  From  the  source  images  (Figure  1  and  Figure  2),  it  is  difficult  to 
see  how  they  are  related.  By  integrating  images,  icons,  and  3D  model  as  shown  in  Figure  4, 
it  is  very  easy  for  the  command  and  control  center  to  monitor  what  is  happening  live  on  the 
ground.  In  this  particular  position,  the  AR  user  on  the  ground  and  the  simulated  forward 
observation  post  on  the  roof  top  can  not  see  each  other.  The  method  can  be  integrated  into 
our  existing  AR  applications  so  that  each  on-site  user  will  be  able  to  see  live  images  from 
other  users'  video  cameras  or  fixed  surveillance  cameras.  This  will  extend  the  X-ray  viewing 
feature  of  AR  systems  by  adding  information  not  only  from  computer  generated  graphics  but 
also  live  images  from  other  users  in  the  field. 

5.  Discussion 

The projection errors on the building in Figure 4 are clearly visible. There are several sources
of error. One is the accuracy of the models of the buildings. More serious problems
come from camera tracking, calibration, and lens distortion. Lens distortion was not
calibrated in this study due to limited time, which is probably one of the major causes of
error. This will be done in the near future.

Camera position, orientation, and field of view calibration is another issue. In our study,
the roof top camera position is fixed and was surveyed with a surveying tool; it is assumed to
be accurate enough and is not considered in the calibration. The orientation and field of
view were calibrated by overlaying the video image on the aerial photo images on Google
Earth. The moving AR user on the ground is tracked by GPS and inertial devices, which can
be inaccurate. However, in a feature-based tracking system such as simultaneous localization
and mapping (SLAM) (Durrant-Whyte & Bailey, 2006), the video sensors can be used to feed
Google Earth, and accuracy should be good as long as the tracking feature is working.

The  prerequisite  of  projecting  the  images  on  the  wall  or  other  3D  objects  is  that  a  database 
of  the  models  of  all  the  objects  is  created  so  that  the  projection  planes  can  be  determined. 
The availability of models of large fixed objects such as buildings is in general not a
problem. However, no single method exists that can reliably and accurately create all
the models. Moving objects such as cars or persons will cause blocked parts that cannot be
removed  using  the  methods  that  are  used  in  this  study.  Research  has  been  done  to  detect 
moving  objects  based  on  video  images  (Carmona  et  al.,  2008).  While  in  theory  it  is  possible  to 
project  the  video  image  on  these  moving  objects,  it  is  not  really  necessary  in  our  applications. 

Google  Earth  has  3D  buildings  in  many  areas;  this  information  may  be  available  for  Google 
Earth  users  and  thus  could  be  used  for  the  calculations.  The  accuracy  of  Google  Earth  3D 
buildings  varies  from  place  to  place;  a  more  accurate  model  may  be  needed  to  get  desired 
results.  Techniques  as  simple  as  manual  surveying  or  as  complex  as  reconstruction  from 
Light  Detection  and  Ranging  (LIDAR)  sensing  may  be  used  to  generate  such  a  model.  Many 
studies  have  been  done  to  create  urban  models  based  on  image  sequences  (Beardsley  et  al., 
1996;  Jurisch  &  Mountain,  2008;  Tanikawa  et  al.,  2002).  It  is  a  non-trivial  task  to  obtain  these 
attributes  in  the  general  case  of  an  arbitrary  location  in  the  world.  Automated  systems  are  an 
active  research  topic  (Pollefeys,  2005;  Teller,  1999),  and  semi-automated  methods  have  been 
demonstrated  at  both  large  and  small  scales  (Julier  et  al.,  2001). 

6.  Future  work 

This  is  a  preliminary  implementation  of  the  concept.  Continuing  this  on-going  effort,  the 
method will be improved in a few aspects. These include improving the registration between
our existing models and the Google Earth images, as well as the calibration issues noted above.
The zooming feature of the camera has not been used yet; using it will require establishing
a relation between the zooming factor and the field of view, another aspect of camera
calibration. Other future work includes user studies related to effectiveness and efficiency
of the system in terms of collaboration.

Currently, when the texture map is updated, the old texture is discarded. It is possible to keep
the previous textures and update only the parts where new images are available. In this way,
a large region will eventually be updated as the camera pans over a larger area.

Several factors contributing to the error of the system, as discussed above, should also be
addressed in the near future.

7.  Conclusion 

In this preliminary study, methods of integrating georegistered information on a virtual
globe are investigated. The application can be used by a command and control center to
monitor a field operation in which multiple AR users are engaged in a collaborative mission.
Google Earth is used to demonstrate the methods. The system integrates georegistered icons,
live video streams from field operators or surveillance cameras, 3D models, and satellite or
aerial photos into one MR environment. The study shows how images are calibrated and
projected onto an approximate world model in real time.